CRANet: a comprehensive residual attention network for intracranial aneurysm image classification

Rupture of an intracranial aneurysm is the leading cause of subarachnoid hemorrhage; it ranks behind only cerebral thrombosis and hypertensive cerebral hemorrhage, and its mortality rate is very high. MRI technology plays an irreplaceable role in the early detection and diagnosis of intracranial aneurysms and supports evaluation of aneurysm size and structure. The growing number of aneurysm images creates a heavy workload for doctors, which increases the likelihood of misdiagnosis. We therefore propose a simple and effective comprehensive residual attention network (CRANet) that uses a residual network to extract aneurysm features and improves the accuracy of aneurysm detection. Extensive experiments show that the proposed CRANet model detects aneurysms effectively. On the test set, the accuracy and recall reached 97.81% and 94%, respectively, a significant improvement in the aneurysm detection rate.

classification [4,5]. Therefore, deep learning models with automatic recognition and classification can assist imaging experts in analyzing aneurysm images more effectively.
With the development of deep learning technology, convolutional neural networks have emerged in rapid succession, from LeNet5 [6] to AlexNet [7], then from VGGNet [8] to GoogLeNet [9], and later ResNet [10]. These models have enabled convolutional neural networks to make breakthroughs in image classification. For example, Ker et al. [11] proposed automatic classification of histological sections of the brain and breast using the Google Inception V3 convolutional neural network, which improved classification performance. The authors of [12] applied a 3D CNN to the classification of CT and MRI brain scans and achieved good results. The CNN is a practical feature extraction method with impressive performance in various application fields [13,14].
However, continually deepening the network may cause gradients to vanish or explode during backpropagation, which makes training diverge and degrades the test results. ResNet was introduced to solve these problems: its residual structure carries shallow information into the deep layers to preserve the transmission of image information. Wang et al. [15] suggested a new metastatic cancer image classification method based on the ResNet model, which effectively alleviated gradient explosion and vanishing gradients and thereby significantly improved the performance of cancer diagnosis. Roy et al. [16] proposed an improved attention-based residual network, which captured the spatial features of spectral images through end-to-end training and used an effective feature calibration mechanism to improve classification performance. Liu et al. [17] proposed a deeply integrated network with an attention mechanism, which significantly improved the success rate of early diagnosis of glaucoma. Jafar et al. [18] constructed a hyperparameter-based approach built on ResNet and CNN structures and showed that it resulted in significant performance improvements. Qiao et al. [19] proposed a simple and effective residual learning intelligent diagnosis system for diagnosing whether a fetus has congenital heart disease. Ghaderzadeh et al. [20] suggested a model based on a deep convolutional neural network (CNN) to distinguish acute lymphoblastic leukemia (ALL) from benign causes and achieved good classification results. Although these residual models have achieved good results, they still have problems such as low computational efficiency, low accuracy, and unreliable diagnosis.
Generally speaking, deep convolutional neural networks can learn deeper features of medical images and are also robust. However, most deep learning methods learn features from the entire image, including tumor information, background information, and noise. As a result, the extracted features are poorly discriminative, classification becomes inaccurate, and computing resources are wasted. To solve these problems, Vaswani et al. [21] proposed a new simple network architecture built on the attention mechanism, in place of the complex recurrent neural networks whose encoder and decoder were connected through attention, and achieved great success. The attention mechanism has shown many advantages in computer vision [22-25]. It does not process the information of the entire image at once but learns certain areas of the image through the attention module and extracts more essential features. Huang et al. [26] used cross-attention and channel attention to enhance the interdependence of features in the spatial and channel dimensions, respectively, and improved the quality of the generated image. Some works use visual feature maps and convolution kernels to understand the decision-making of CNNs in image classification tasks [27-29]. The SE module, which uses channel attention, has been applied not only to image classification but also to semantic segmentation [30,31], image captioning [32], and other fields. Roy et al. [33] introduced spatial and channel attention mechanisms to enhance meaningful features. The above methods mainly add one or two attention mechanisms to improve the model's performance and do not consider how the position of multiple attention mechanisms in the network affects classification performance. To address this gap, this paper studies the influence of the position of the attention mechanism in the deep residual network structure on the classification results.
For the 2-classification (T: without aneurysm and F: with aneurysm) and 4-classification (L: aneurysm diameter greater than 7 mm, S: aneurysm diameter less than 3 mm, M: aneurysm diameter between 3 and 7 mm, and T: without aneurysm) problems of MRA aneurysm images, the main contributions of this paper are as follows: Firstly, a deep residual neural network with a comprehensive attention mechanism, called CRANet, is proposed. Each area of the image contributes differently to the network, so a spatial attention mechanism is added to weight each area differently. During training, not every feature map helps improve classification performance, so a channel attention mechanism is added to estimate the importance of each feature map. The two attention mechanisms are combined with the ResNet model to improve the performance of aneurysm classification.
Secondly, the ResNet network is improved. In different positions of the network, two attention mechanisms are added. As the depth of the network increases, the feature information will be lost to different degrees. The attention mechanism can control the features in the network to improve classification performance.
Finally, extensive experimental results show that the proposed model outperforms structures that use no attention mechanism or only part of the attention mechanism.

Datasets and preprocessing
The data set used in this article contains 3-D MRA images of 678 patients from 9 different devices. The 3-D images differ in size from patient to patient, ranging from 448 × 448 to 696 × 768, and each 3-D image contains 128 to 152 2-D slices. Of the 678 patients, 578 have aneurysms and 100 are normal subjects without an aneurysm. Patients with aneurysms are roughly divided into three categories according to aneurysm diameter: S category, less than 3 mm; M category, between 3 and 7 mm; L category, greater than 7 mm. This paper randomly selects 10,000 two-dimensional image slices and divides them into a training set, a validation set, and a test set at a ratio of 8:1:1 to improve the model's generalization ability. The images are randomly cropped to 260 × 260 and standardized, and the training data are augmented with random horizontal flipping, vertical flipping, rotation (between −30 degrees and 30 degrees), and adjustment of brightness, contrast, and saturation. The slice image of each type of MRA aneurysm is shown in Fig. 1.
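The 8:1:1 division described above can be illustrated with a minimal sketch. The function name `split_indices`, the fixed seed, and the choice to split at the slice level (rather than by patient) are assumptions made for illustration; the text does not state how the random split was implemented.

```python
import random


def split_indices(n_items, ratios=(8, 1, 1), seed=0):
    """Shuffle indices 0..n_items-1 and split them by integer ratios.

    Hypothetical helper: the paper only states a random 8:1:1 split.
    """
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # reproducible shuffle
    total = sum(ratios)
    n_train = n_items * ratios[0] // total
    n_val = n_items * ratios[1] // total
    train = indices[:n_train]
    val = indices[n_train:n_train + n_val]
    test = indices[n_train + n_val:]  # remainder goes to the test set
    return train, val, test


train_idx, val_idx, test_idx = split_indices(10_000)
print(len(train_idx), len(val_idx), len(test_idx))  # 8000 1000 1000
```

Each index would then be mapped to one 2-D slice; the three index lists are disjoint and together cover all 10,000 slices.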

Model structure
Medical image processing is highly valued by researchers worldwide. Medical image classification is an essential research direction in medical image processing, since it is the precondition for reasonable evaluation and appropriate treatment plans for patients, and it plays an increasingly important role in the medical field. Deep learning reduces the need for manual feature extraction. The MRA aneurysm images are first processed with image enhancement and other techniques to strengthen their features. The mixed attention ResNet network then learns helpful information autonomously so that the entire system can output the best classification results. The experimental flow structure is shown in the accompanying flow figure.
Figure caption (partly recovered): (b) represents type L, the aneurysm diameter is greater than 7 mm; (c) represents type M, the aneurysm diameter is between 3 and 7 mm; and (d) represents type S, the aneurysm diameter is less than 3 mm.
The attention mechanism has been applied to different scenarios to solve different problems in computer vision. Zhao et al. [34] proposed a two-stage segmentation network that combines global and local attention modules with fully convolutional networks to handle the uneven distribution of image boundaries. Oktay et al. [35] combined the spatial attention model with the U-Net network for pancreas segmentation. Previous studies only added the attention mechanism and did not examine how its position in the network model affects classification performance. Therefore, this paper applies the attention mechanism at different network locations and observes the resulting performance.
All experiments in this article are based on the ResNet network model, as shown in Fig. 3. Our network structure includes five channel attention modules, five spatial attention modules, and four Blocks.
The channel attention modules (CA1-5) weight the feature maps output by the preceding layer to obtain more essential channel features. The spatial attention modules (SA1-5) enhance attention to a specific area of the feature map while suppressing background information and irrelevant areas. The Block is an integral part of the ResNet network model. Each Block includes four convolutional layers with 3 × 3 convolution kernels, and there is a 1 × 1 convolutional layer between every two Blocks. In addition, the residual connection structure is added so that image information is not lost during the convolution operation. The softmax function expresses the multi-classification results as probabilities to realize the classification of aneurysms. More details of these modules are shown in Figs. 4, 5, and 6.
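As a small illustration of how softmax turns the network's final scores into class probabilities, the following sketch applies it to one slice in the 4-classification setting. The logit values are hypothetical and chosen only for the example; the label order is likewise an assumption.

```python
import math


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["T", "S", "M", "L"]      # 4-classification labels from the paper
logits = [0.2, 1.5, 3.1, -0.4]     # hypothetical network outputs
probs = softmax(logits)
predicted = labels[probs.index(max(probs))]
print(predicted, [round(p, 3) for p in probs])
```

The predicted class is the one with the largest probability, here the label whose logit is largest.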
During information transmission, deep neural networks may suffer from problems such as loss of feature information, vanishing gradients, gradient explosion, and network non-convergence. Introducing the residual structure alleviates these problems to a large extent. The deep residual network can pass information directly to subsequent layers, skipping the intermediate layers, which reduces the network's complexity, addresses the degradation problem of deep networks, and improves network performance. The essential parts of the residual network are a convolutional layer, batch normalization (BN), and an activation function. The convolutional layer is used for feature extraction, and the convolution kernel is set to downsample the information to reduce the amount of calculation. During training, the data distribution changes as the training parameters are updated, so the BN operation is used to adaptively adjust the network parameters and counteract the impact of shifts or growth in the data. The activation function adds nonlinearity so that the network can fit more complex tasks. Figure 4 shows the details of the residual structure.
Compared with a network model without the skip structure, a residual network of this structure retains feature information to the greatest extent.
If the dimension of x_i differs from that of the learned residual function, a 1 × 1 convolution kernel can be used to increase or decrease the dimension. The output of the residual structure can then be defined as:
$$C_x = C'_x + x_i$$
where x_i represents the input feature map, C'_x represents the residual term, and C_x represents the optimal mapping.

Residual spatial attention model
The residual spatial attention model lets regions with similar characteristics enhance each other, highlighting the tumor area in the global field of view. The specific details are shown in Fig. 5. The sigmoid function yields a coefficient in [0, 1], through which different weights are assigned to each channel or spatial position so that the network can assign different degrees of importance to each feature map.
Let $F \in \mathbb{R}^{C \times H \times W}$ be the input of the spatial attention module, where C is the number of feature maps and H and W are the height and width, respectively. The input features are processed through convolution and BN operations, and the spatial attention coefficient S is obtained using the Sigmoid function. The formula is expressed as follows:
$$S = \mathrm{Sigmoid}(\mathrm{BN}(\mathrm{Conv}(F)))$$
In this paper, the input feature map and the obtained spatial attention feature map are fused through the residual structure to obtain the final spatial attention feature map.
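The following dependency-free sketch illustrates the idea of a residual spatial attention step on a tiny feature map. It is a simplification under stated assumptions, not the authors' implementation: the convolution is taken to be a 1 × 1 convolution that collapses the C channels into one score map, the BN step is omitted, and the residual fusion is assumed to be "attended features plus input". None of these details are specified in the text.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def residual_spatial_attention(feature, weights, bias=0.0):
    """feature: C x H x W nested lists; weights: one weight per channel.

    Assumptions (not from the paper): 1x1 convolution, no BN,
    fusion computed as feature * attention + feature.
    """
    channels = len(feature)
    height, width = len(feature[0]), len(feature[0][0])
    # 1x1 convolution: weighted sum over channels at each spatial position.
    scores = [[sum(weights[c] * feature[c][i][j] for c in range(channels)) + bias
               for j in range(width)] for i in range(height)]
    # Sigmoid maps each score to a spatial coefficient between 0 and 1.
    attn = [[sigmoid(s) for s in row] for row in scores]
    # Residual fusion: re-weighted features added back to the input.
    out = [[[feature[c][i][j] * attn[i][j] + feature[c][i][j]
             for j in range(width)] for i in range(height)]
           for c in range(channels)]
    return attn, out


feature = [[[1.0, 2.0], [3.0, 4.0]],
           [[0.0, -1.0], [2.0, 1.0]]]   # hypothetical 2 x 2 x 2 input
attn, out = residual_spatial_attention(feature, weights=[0.5, -0.25])
print(len(out), len(out[0]), len(out[0][0]))  # 2 2 2
```

The output keeps the input's shape, and every spatial position is scaled by the same coefficient across all channels, which is what distinguishes spatial attention from channel attention.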

Residual channel attention model
We introduced a channel attention model to find the best feature channel, which can strengthen the attention to related channels while suppressing irrelevant channel feature maps. The specific details are shown in Fig. 6.
Let F represent the input of the channel attention model. We use maximum pooling and average pooling layers to obtain global information, compressing each H × W two-dimensional matrix into a single number that represents the characteristics of that matrix. FR1 consists of two fully connected layers with a ReLU layer between them. The channel size of the first fully connected layer is C/r (r = 2), and the channel size of the second fully connected layer is C. FR2 has the same structure as FR1. The obtained feature maps F_1 and F_2 are added together, and the channel attention coefficient is obtained through the Sigmoid function. The input feature map is then multiplied by the channel attention coefficient to give each channel a different weight, and the result is added to the input feature map to obtain the final channel attention map Output_CA:
$$\mathrm{Output}_{CA} = F \otimes \mathrm{Sigmoid}(F_1 + F_2) + F$$
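The channel attention steps described above (global max and average pooling, two bottleneck branches with r = 2, Sigmoid, multiply, add) can be traced with a small dependency-free sketch. Which pooled vector feeds FR1 and which feeds FR2 is an assumption, as are all weight values; the helper names are hypothetical.

```python
import math


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def fc(vector, weight, bias):
    """Fully connected layer; weight has shape out_dim x in_dim."""
    return [sum(w * x for w, x in zip(row, vector)) + b
            for row, b in zip(weight, bias)]


def bottleneck(vector, params):
    """FC (C -> C/r), ReLU, FC (C/r -> C), as described for FR1 and FR2."""
    w1, b1, w2, b2 = params
    hidden = [max(0.0, h) for h in fc(vector, w1, b1)]
    return fc(hidden, w2, b2)


def residual_channel_attention(feature, fr1, fr2):
    """feature: C x H x W nested lists. Returns (coefficients, output)."""
    flat = [[v for row in ch for v in row] for ch in feature]
    max_desc = [max(ch) for ch in flat]             # global max pooling
    avg_desc = [sum(ch) / len(ch) for ch in flat]   # global average pooling
    f1 = bottleneck(max_desc, fr1)  # assumption: max-pooled vector goes to FR1
    f2 = bottleneck(avg_desc, fr2)  # assumption: avg-pooled vector goes to FR2
    coeff = [sigmoid(a + b) for a, b in zip(f1, f2)]
    # Scale each channel by its coefficient, then add the input back.
    out = [[[v * k + v for v in row] for row in ch]
           for ch, k in zip(feature, coeff)]
    return coeff, out


# Hypothetical example with C = 2 and r = 2, so the hidden size is 1.
feature = [[[1.0, 2.0], [3.0, 4.0]],
           [[0.0, -1.0], [2.0, 1.0]]]
fr1 = ([[0.1, -0.2]], [0.0], [[0.3], [-0.1]], [0.0, 0.0])
fr2 = ([[0.2, 0.1]], [0.0], [[-0.2], [0.4]], [0.0, 0.0])
coeff, out = residual_channel_attention(feature, fr1, fr2)
print(len(coeff), len(out), len(out[0]), len(out[0][0]))  # 2 2 2 2
```

Because each coefficient lies strictly between 0 and 1, every channel of the output is the input channel scaled by a factor between 1 and 2, so no channel is ever removed entirely by the residual form.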

Evaluation methods
To better evaluate the classifier's performance, three metrics, accuracy, recall, and F1 score, are used to assess the model's classification performance [19]. The F1 score combines precision and recall into a comprehensive assessment of CRANet, so the higher the F1 score, the better CRANet's classification performance. The larger the recall value, the higher the sensitivity to the tumor. The F1 score ranges from 0 to 1, where 1 represents the best performance. The accuracy is calculated as follows:
$$\mathrm{Accuracy} = \frac{TP + TN}{P + N}$$
where TP is the number of correctly classified malignant samples, TN is the number of correctly classified benign samples, P is the number of malignant samples, and N is the number of benign samples.
We computed not only the overall accuracy but also the accuracy of each category. The calculation formula is as follows:
$$ACC_i = \frac{S_i}{S}$$
where ACC_i represents the prediction accuracy of each category, S_i represents the number of correct predictions for each category, and S represents the total number.
The recall represents the proportion of malignant samples that are judged to be malignant, and it is calculated as follows:
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
where FN is the number of samples that are malignant and misclassified as benign.
The F1 score balances precision and recall, and it is calculated as follows:
$$F1 = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}$$
where precision = TP/(TP + FP), and FP represents the number of non-tumor images judged to be tumor images.
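The evaluation metrics in this section can be checked numerically with a short sketch. The confusion-matrix counts below are hypothetical and are not taken from the paper's experiments; the function name is likewise illustrative.

```python
def binary_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 from confusion-matrix counts."""
    p = tp + fn  # number of malignant (positive) samples
    n = tn + fp  # number of benign (negative) samples
    accuracy = (tp + tn) / (p + n)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts, for illustration only.
metrics = binary_metrics(tp=90, tn=80, fp=20, fn=10)
print({name: round(value, 4) for name, value in metrics.items()})
```

With these counts the accuracy is 170/200 = 0.85 and the recall is 90/100 = 0.9, which shows how a model can have a higher recall than accuracy when most of its errors are false positives.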

Experimental results
The experimental environment is as follows: all models in this article are trained with the PyTorch deep learning framework on an NVIDIA Tesla V100 GPU with 32 GB of video memory. The model hyperparameters are set as follows: the Adam algorithm is used to optimize the loss function, and the deep learning model is trained with mini-batches of samples, with batch_size set to 32. A fixed-step strategy is used to adjust the learning rate during training; the initial learning rate is set to 0.0001, and the gamma value is 0.85. In addition, L2 regularization is added to impose penalty constraints on the weight parameters, with the penalty coefficient set to 0.0001. The remaining parameters are then determined by training for different numbers of epochs. This paper performs experiments on the ResNet18, ResNet34, ResNet50, and ResNet101 network models. The optimal classification performance is obtained by setting different numbers of epochs, and attention models are then added to the optimal model at different positions for further analysis. The experimental results are shown in Table 1.
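A fixed-step learning-rate strategy multiplies the learning rate by gamma once every fixed number of epochs. The sketch below uses the initial rate (0.0001) and gamma (0.85) stated above; the step size of 10 epochs is a hypothetical value, since the text does not report it. In PyTorch, this kind of schedule corresponds to `torch.optim.lr_scheduler.StepLR`.

```python
def step_decay_lr(epoch, initial_lr=1e-4, gamma=0.85, step_size=10):
    """Learning rate after `epoch` epochs under a fixed-step schedule.

    step_size=10 is an assumed value; initial_lr and gamma follow the text.
    """
    return initial_lr * gamma ** (epoch // step_size)


for epoch in (0, 9, 10, 25, 200):
    print(epoch, f"{step_decay_lr(epoch):.3e}")
```

Within each block of `step_size` epochs the rate stays constant, then drops by 15%, so the schedule decays geometrically in the number of completed steps.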
To obtain a reliable and stable network model, this article runs experiments on the validation set to verify the generalization ability and accuracy of the model. Each experiment was repeated five times to reduce experimental error and ensure reliable results, and the average was taken as the final result. The experimental results are shown in Fig. 7.
It can be seen from Table 1 that the ResNet18 network model performs the best, achieving an accuracy of 96.57% when the number of epochs is 200. Among them, 96.57% refers to [...]. Fig. 7 shows that ResNet18 obtained the best classification effect among the four models, which further suggests that the complexity of this model matches the MRA aneurysm image data. Therefore, this article further studies the effect on classification performance of adding different attention models at different positions in the ResNet18 network model. To verify that the spatial attention model can effectively improve the classification results, we first add a spatial attention model after the pooling layer and then add a spatial attention model after each Block in turn.
We first validate the effect of the spatial attention model on aneurysm classification by adding it after the pooling layer and after each Block and observing the change in accuracy. We use SA1, SA1-2, SA1-3, SA1-4, and SA1-5 to represent the different positions of the spatial attention model in the network. For example, SA1 denotes a spatial attention model added after the pooling layer, and SA1-2 denotes spatial attention models added after both the pooling layer and Block1. Table 2 shows the influence of the spatial attention model on the classification results. We also studied the influence of the channel attention model separately. In this comparison, we introduce only the channel attention model and add it at different network positions to observe the change in accuracy. We use CA1, CA1-2, CA1-3, CA1-4, and CA1-5 to represent the different positions of the channel attention model in the network. For example, CA1 denotes a channel attention model added after the pooling layer, and CA1-2 denotes channel attention models added after both the pooling layer and Block1. The experimental results are shown in Table 3.
It can be seen from Tables 2 and 3 that adding either the spatial or the channel attention model to the ResNet18 network model improves the results to a certain extent. We therefore combine the two to improve the network structure and call the result a comprehensive residual attention network.
In this comparison, we combine the two attention models to validate their performance. In Table 4, SA1 + CA1 (CA1 + SA1) means adding the spatial attention model followed by the channel attention model (the channel attention model followed by the spatial attention model) after the pooling layer, and SA1-2 + CA1-2 (CA1-2 + SA1-2) means that the spatial attention model and the channel attention model (the channel attention model and the spatial attention model) are added after both the pooling layer and Block1. The experimental results are shown in Table 4.
We validate the impact of ResNet18, ResNet34, ResNet50, ResNet101, and CRANet network models on aneurysm classification performance. It can be seen from Table 4 that the improved ResNet18 network using spatial and channel attention has achieved better classification performance. We also used multiple evaluation indicators to evaluate network performance, and the experimental results are shown in Table 5 and Fig. 8.
The extensive experimental results show that our proposed CRANet is effective for the detection of intracranial aneurysms. The residual spatial attention models give different degrees of attention to different spatial locations of each feature map. Similarly, the residual channel attention models give different degrees of attention to different feature maps. This allows better extraction of tumor features from the feature maps, thereby improving the classification performance on aneurysm images. We conducted experimental verification on different network models to observe the changes in accuracy. The detailed experimental results are shown in Table 5. For the 2-classification problem, the accuracy and recall reached 97.81% and 94%. For the multi-class classification, the accuracy and recall were 92.55% and 91%.
To further verify the validity and rationality of the model, the classical models CNN [5], VGG [8], GoogLeNet [9], ResNet [10], InceptionV3 [36], and DenseNet [37] are compared with the CRANet model proposed in this paper. The detailed data in Table 6 show that our proposed CRANet network achieves the best performance in the aneurysm classification tasks. ResNet18 performs better than the other ResNet variants, reaching 96.57% accuracy, 0.91 recall, and 0.92 F1 score in the 2-classification task. In the 4-classification task, it achieved 91.75% accuracy, 0.87 recall, and 0.88 F1 score. Spatial and channel attention focus well on the regions with significant features, indicating that our method is an effective strategy for classification tasks. As shown in Table 6, the proposed model improves aneurysm classification performance on both the 2-classification and 4-classification problems, and it improves aneurysm detection performance to a certain extent compared with several classical models. Extensive experiments indicate that the model is very effective for the classification of aneurysms.

Conclusion and future work
This paper proposed a mixed residual attention learning model for intracranial aneurysm detection called CRANet. Using the residual structure in the convolutional neural network effectively avoids the loss of information and reduces the detection error. The CRANet model can identify whether an image contains an aneurysm and can better distinguish the type of aneurysm. CRANet can therefore better assist doctors in diagnosing aneurysms, reduce their workload, and improve their work efficiency.
In future work, we will collect more aneurysm images and use a deeper network structure to learn deeper features to improve the robustness and accuracy of the model. Then an automatic segmentation network model will be designed to diagnose the structure and location of the aneurysm. In addition, this will have a positive effect on clinical practice. At the same time, it will be of great significance to human health.